Succinct Dictionary Matching with No Slowdown
Abstract
The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size σ, build a data structure so that we can find in any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a data structure that occupies O(m log m) bits of space, where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log σ + O(1)) + O(d log(n/d)) bits of space while still maintaining the ability to answer queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses space O(n log σ) while answering queries in O(|T| log log n + occ) time. In this paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)), where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m^ε for any constant 0 < ε < 1. The query time remains unchanged.
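For orientation, the classical pointer-based Aho-Corasick automaton that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the textbook construction, not the succinct representation proposed in the paper. The function names (`build_automaton`, `find_all`, `trie_label_entropy`) and the sample dictionary are assumptions made for this example, and the entropy helper is only one plausible reading of H0 as the zeroth-order entropy of the trie's edge labels.

```python
from collections import Counter, deque
from math import log2


def build_automaton(patterns):
    """Textbook Aho-Corasick automaton (hypothetical helper, not the succinct encoding)."""
    goto = [{}]   # goto[s][c] -> child of state s on character c (trie edges)
    fail = [0]    # fail[s] -> state of the longest proper suffix of s that is a trie prefix
    out = [[]]    # out[s] -> indices of all patterns that end at state s
    for idx, pat in enumerate(patterns):
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(idx)
    # Breadth-first traversal: failure links of shallower states are ready first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            # Copying outputs trades space for simple O(occ) reporting in this sketch.
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out


def find_all(text, patterns, automaton):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


def trie_label_entropy(goto):
    """Zeroth-order empirical entropy (bits per label) of the trie edge labels."""
    counts = Counter(c for edges in goto for c in edges)
    total = sum(counts.values())
    return -sum(k / total * log2(k / total) for k in counts.values())


patterns = ["he", "she", "his", "hers"]   # sample dictionary: d = 4, n = 12
automaton = build_automaton(patterns)
print(len(automaton[0]))                  # m = 10 states, within the bound n + 1 = 13
print(find_all("ushers", patterns, automaton))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Each state here costs at least one machine word per edge and per failure link, which is where the O(m log m) bits of the classical solution come from. The paper's contribution is to bring this down to roughly log σ (or H0) bits per state without changing the O(|T| + occ) matching time.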
Similar resources
Succinct 2D Dictionary Matching with No Slowdown
The dictionary matching problem seeks all locations in a given text that match any of the patterns in a given dictionary. Efficient algorithms for dictionary matching scan the text once, searching for all patterns simultaneously. This paper presents the first 2-dimensional dictionary matching algorithm that operates in small space and linear time. Given d patterns, D = {P1, . . . , Pd}, each of...
Succinct Online Dictionary Matching with Improved Worst-Case Guarantees
In the online dictionary matching problem the goal is to preprocess a set of patterns D = {P1, . . . , Pd} over alphabet Σ, so that given an online text (one character at a time) we report all of the occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching pro...
Design of Practical Succinct Data Structures for Large Data Collections
We describe a set of basic succinct data structures which have been implemented as part of the Succinct library, and applications on top of the library: an index to speed-up the access to collections of semi-structured data, a compressed string dictionary, and a compressed dictionary for scored strings which supports top-k prefix matching.
Dynamic 2D Dictionary Matching in Small Space
The dictionary matching problem preprocesses a set of patterns and finds all occurrences of each of the patterns in a text when it is provided. We focus on the dynamic setting, in which patterns can be inserted into and removed from the dictionary without reprocessing the entire dictionary. This article presents the first algorithm that performs dynamic dictionary matching on two-dimensional dat...
Applications of Succinct Dynamic Compact Tries to Some String Problems
The dynamic compact trie is a fundamental data structure for a wide range of string processing problems. In this paper, we report our recent work on succinct dynamic compact tries that store a set of strings of total length n in O(n log σ) space, supporting pattern matching and insert/delete operations in O((|P|/α) f(n)) time, where P is a pattern string, α = Θ(log_σ n), and f(n) = O((log log n) ...